Overview

Dataset statistics

Number of variables: 13
Number of observations: 7959
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 754.1 KiB
Average record size in memory: 97.0 B
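
The average record size appears to follow from the total size and the number of observations. A quick check, assuming 1 KiB = 1024 bytes (the variable names are illustrative, not part of the report):

```python
# Figures copied from the dataset statistics above.
total_size_kib = 754.1
n_observations = 7959

# Assumption: the report's KiB is 1024 bytes.
avg_record_bytes = total_size_kib * 1024 / n_observations
print(f"{avg_record_bytes:.1f} B")
```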

Variable types

Numeric: 9
Categorical: 3
Boolean: 1

Warnings

isLastSmallPeriod has constant value "False" [Constant]
startDate has a high cardinality: 1698 distinct values [High cardinality]
endDate has a high cardinality: 1673 distinct values [High cardinality]
minDate has a high cardinality: 1560 distinct values [High cardinality]
df_index has unique values [Unique]
l_bias has unique values [Unique]

Reproduction

Analysis started: 2021-01-10 15:13:08.501646
Analysis finished: 2021-01-10 15:13:18.755232
Duration: 10.25 seconds
Software version: pandas-profiling v2.10.0
Download configuration: config.yaml
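
The reported duration can be recomputed from the two timestamps. A small sketch using only the standard library (variable names are illustrative):

```python
from datetime import datetime

# Timestamps copied from the Reproduction block above.
started = datetime.fromisoformat("2021-01-10 15:13:08.501646")
finished = datetime.fromisoformat("2021-01-10 15:13:18.755232")

duration_s = (finished - started).total_seconds()
print(f"{duration_s:.2f} seconds")
```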

Variables

df_index
Real number (ℝ≥0)

UNIQUE

Distinct7959
Distinct (%)100.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean10437.68891
Minimum0
Maximum20998
Zeros1
Zeros (%)< 0.1%
Memory size62.3 KiB

Quantile statistics

Minimum0
5-th percentile1041.9
Q15116.5
median10391
Q315683
95-th percentile20013.1
Maximum20998
Range20998
Interquartile range (IQR)10566.5

Descriptive statistics

Standard deviation: 6101.733843
Coefficient of variation (CV): 0.5845866741
Kurtosis: -1.216200541
Mean: 10437.68891
Median Absolute Deviation (MAD): 5285
Skewness: 0.01343791213
Sum: 83073566
Variance: 37231155.89
Monotonicity: Strictly increasing
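
Several of these statistics are derived from others. The sketch below recomputes the interquartile range, coefficient of variation, and variance for df_index from the reported quartiles, mean, and standard deviation. It assumes the usual definitions (IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared), and small differences are expected because the inputs are rounded.

```python
# Values copied from the df_index statistics above.
q1, q3 = 5116.5, 15683.0
mean, std = 10437.68891, 6101.733843

iqr = q3 - q1          # interquartile range
cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the squared standard deviation

print(iqr, round(cv, 6), round(variance, 2))
```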
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
01
 
< 0.1%
7891
 
< 0.1%
130991
 
< 0.1%
180871
 
< 0.1%
90011
 
< 0.1%
110481
 
< 0.1%
192361
 
< 0.1%
130911
 
< 0.1%
151381
 
< 0.1%
48951
 
< 0.1%
Other values (7949)7949
99.9%
ValueCountFrequency (%)
01
< 0.1%
81
< 0.1%
101
< 0.1%
181
< 0.1%
231
< 0.1%
ValueCountFrequency (%)
209981
< 0.1%
209941
< 0.1%
209931
< 0.1%
209891
< 0.1%
209841
< 0.1%

code
Real number (ℝ≥0)

Distinct296
Distinct (%)3.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean367889.7845
Minimum1
Maximum603993
Zeros0
Zeros (%)0.0%
Memory size62.3 KiB

Quantile statistics

Minimum1
5-th percentile538
Q12271
median600066
Q3600809
95-th percentile601919
Maximum603993
Range603992
Interquartile range (IQR)598538

Descriptive statistics

Standard deviation: 280916.49
Coefficient of variation (CV): 0.763588721
Kurtosis: -1.707223803
Mean: 367889.7845
Median Absolute Deviation (MAD): 1752
Skewness: -0.4551044012
Sum: 2928034795
Variance: 7.891407438 × 10^10
Monotonicity: Increasing
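
The monotonicity label separates "Strictly increasing" (df_index) from "Increasing" (code, which repeats values). The function below is a hypothetical classifier showing one way such labels could be derived. It is an assumption for illustration, not the profiler's actual implementation.

```python
def monotonicity(values):
    """Classify a sequence by comparing each element with its successor."""
    pairs = list(zip(values, values[1:]))
    if all(a < b for a, b in pairs):
        return "Strictly increasing"
    if all(a <= b for a, b in pairs):
        return "Increasing"
    if all(a > b for a, b in pairs):
        return "Strictly decreasing"
    if all(a >= b for a, b in pairs):
        return "Decreasing"
    return "Not monotonic"

print(monotonicity([0, 8, 10, 18, 23]))  # no repeats, always rising
print(monotonicity([1, 1, 1, 2271]))     # repeats allowed, never falling
print(monotonicity([10, 6, 16, 18]))     # falls, then rises
```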
Histogram with fixed size bins (bins=50)
Value                 Count  Frequency (%)
300015                60     0.8%
600519                60     0.8%
2050                  60     0.8%
2236                  56     0.7%
600674                55     0.7%
2271                  55     0.7%
661                   55     0.7%
600809                55     0.7%
2475                  55     0.7%
2311                  54     0.7%
Other values (286)    7394   92.9%
Value                 Count  Frequency (%)
1                     25     0.3%
2                     32     0.4%
63                    44     0.6%
69                    29     0.4%
100                   35     0.4%
Value                 Count  Frequency (%)
603993                23     0.3%
603986                15     0.2%
603899                29     0.4%
603833                10     0.1%
603799                16     0.2%

lossRate
Real number (ℝ)

Distinct7762
Distinct (%)97.5%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean-0.05721130744
Minimum-0.3984575835
Maximum-0.0001040907671
Zeros0
Zeros (%)0.0%
Memory size62.3 KiB

Quantile statistics

Minimum-0.3984575835
5-th percentile-0.1409787552
Q1-0.07695414302
median-0.04684975767
Q3-0.02641694786
95-th percentile-0.00842167371
Maximum-0.0001040907671
Range0.3983534928
Interquartile range (IQR)0.05053719516

Descriptive statistics

Standard deviation: 0.0431035406
Coefficient of variation (CV): -0.7534094662
Kurtosis: 3.816888651
Mean: -0.05721130744
Median Absolute Deviation (MAD): 0.02382864142
Skewness: -1.544146049
Sum: -455.3447959
Variance: 0.001857915212
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
-0.060606060614
 
0.1%
-0.038461538464
 
0.1%
-0.059701492544
 
0.1%
-0.066666666674
 
0.1%
-0.029411764713
 
< 0.1%
-0.083333333333
 
< 0.1%
-0.020408163273
 
< 0.1%
-0.042857142863
 
< 0.1%
-0.038461538463
 
< 0.1%
-0.13
 
< 0.1%
Other values (7752)7925
99.6%
ValueCountFrequency (%)
-0.39845758351
< 0.1%
-0.39462036091
< 0.1%
-0.34383775351
< 0.1%
-0.33040147851
< 0.1%
-0.31737193761
< 0.1%
ValueCountFrequency (%)
-0.00010409076711
< 0.1%
-0.00016291951781
< 0.1%
-0.00029949086551
< 0.1%
-0.00034048348661
< 0.1%
-0.0003570790931
< 0.1%

startDate
Categorical

HIGH CARDINALITY

Distinct1698
Distinct (%)21.3%
Missing0
Missing (%)0.0%
Memory size62.3 KiB
2019-02-25: 56
2015-03-30: 52
2014-12-08: 46
2015-04-07: 46
2015-03-23: 41
Other values (1693): 7718

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters: 79590
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 426
Unique (%): 5.4%

Sample

1st row2011-05-03
2nd row2013-01-22
3rd row2013-02-01
4th row2014-07-03
5th row2014-11-11
ValueCountFrequency (%)
2019-02-2556
 
0.7%
2015-03-3052
 
0.7%
2014-12-0846
 
0.6%
2015-04-0746
 
0.6%
2015-03-2341
 
0.5%
2015-05-2638
 
0.5%
2014-12-2238
 
0.5%
2015-04-1334
 
0.4%
2015-04-1433
 
0.4%
2014-08-0432
 
0.4%
Other values (1688)7543
94.8%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
018531
23.3%
-15918
20.0%
114452
18.2%
213667
17.2%
42944
 
3.7%
72690
 
3.4%
32688
 
3.4%
52561
 
3.2%
92293
 
2.9%
61993
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number63672
80.0%
Dash Punctuation15918
 
20.0%

Most frequent character per category

ValueCountFrequency (%)
018531
29.1%
114452
22.7%
213667
21.5%
42944
 
4.6%
72690
 
4.2%
32688
 
4.2%
52561
 
4.0%
92293
 
3.6%
61993
 
3.1%
81853
 
2.9%
ValueCountFrequency (%)
-15918
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common79590
100.0%

Most frequent character per script

ValueCountFrequency (%)
018531
23.3%
-15918
20.0%
114452
18.2%
213667
17.2%
42944
 
3.7%
72690
 
3.4%
32688
 
3.4%
52561
 
3.2%
92293
 
2.9%
61993
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII79590
100.0%

Most frequent character per block

ValueCountFrequency (%)
018531
23.3%
-15918
20.0%
114452
18.2%
213667
17.2%
42944
 
3.7%
72690
 
3.4%
32688
 
3.4%
52561
 
3.2%
92293
 
2.9%
61993
 
2.5%

endDate
Categorical

HIGH CARDINALITY

Distinct1673
Distinct (%)21.0%
Missing0
Missing (%)0.0%
Memory size62.3 KiB
2015-05-27: 49
2015-04-07: 43
2019-04-01: 42
2019-02-25: 42
2014-12-04: 39
Other values (1668): 7744

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters: 79590
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 422
Unique (%): 5.3%

Sample

1st row2011-05-20
2nd row2013-01-29
3rd row2013-03-01
4th row2014-07-29
5th row2014-11-28
ValueCountFrequency (%)
2015-05-2749
 
0.6%
2015-04-0743
 
0.5%
2019-04-0142
 
0.5%
2019-02-2542
 
0.5%
2014-12-0439
 
0.5%
2015-01-0538
 
0.5%
2015-05-2636
 
0.5%
2016-08-1536
 
0.5%
2015-04-0835
 
0.4%
2015-04-1431
 
0.4%
Other values (1663)7568
95.1%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
018728
23.5%
-15918
20.0%
114347
18.0%
213529
17.0%
42870
 
3.6%
52757
 
3.5%
72600
 
3.3%
32588
 
3.3%
92386
 
3.0%
61963
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Decimal Number63672
80.0%
Dash Punctuation15918
 
20.0%

Most frequent character per category

ValueCountFrequency (%)
018728
29.4%
114347
22.5%
213529
21.2%
42870
 
4.5%
52757
 
4.3%
72600
 
4.1%
32588
 
4.1%
92386
 
3.7%
61963
 
3.1%
81904
 
3.0%
ValueCountFrequency (%)
-15918
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common79590
100.0%

Most frequent character per script

ValueCountFrequency (%)
018728
23.5%
-15918
20.0%
114347
18.0%
213529
17.0%
42870
 
3.6%
52757
 
3.5%
72600
 
3.3%
32588
 
3.3%
92386
 
3.0%
61963
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII79590
100.0%

Most frequent character per block

ValueCountFrequency (%)
018728
23.5%
-15918
20.0%
114347
18.0%
213529
17.0%
42870
 
3.6%
52757
 
3.5%
72600
 
3.3%
32588
 
3.3%
92386
 
3.0%
61963
 
2.5%

minDate
Categorical

HIGH CARDINALITY

Distinct1560
Distinct (%)19.6%
Missing0
Missing (%)0.0%
Memory size62.3 KiB
2015-05-07: 76
2014-12-24: 62
2014-08-28: 61
2015-04-20: 58
2014-09-22: 52
Other values (1555): 7650

Length

Max length10
Median length10
Mean length10
Min length10

Characters and Unicode

Total characters: 79590
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 450
Unique (%): 5.7%

Sample

1st row2011-05-16
2nd row2013-01-25
3rd row2013-02-22
4th row2014-07-21
5th row2014-11-18
ValueCountFrequency (%)
2015-05-0776
 
1.0%
2014-12-2462
 
0.8%
2014-08-2861
 
0.8%
2015-04-2058
 
0.7%
2014-09-2252
 
0.7%
2014-12-0951
 
0.6%
2015-04-1551
 
0.6%
2019-03-2849
 
0.6%
2015-04-0949
 
0.6%
2019-03-0847
 
0.6%
Other values (1550)7403
93.0%
Histogram of lengths of the category

Most occurring characters

ValueCountFrequency (%)
018207
22.9%
-15918
20.0%
114286
17.9%
213809
17.4%
72840
 
3.6%
42702
 
3.4%
32699
 
3.4%
52608
 
3.3%
92432
 
3.1%
82078
 
2.6%

Most occurring categories

ValueCountFrequency (%)
Decimal Number63672
80.0%
Dash Punctuation15918
 
20.0%

Most frequent character per category

ValueCountFrequency (%)
018207
28.6%
114286
22.4%
213809
21.7%
72840
 
4.5%
42702
 
4.2%
32699
 
4.2%
52608
 
4.1%
92432
 
3.8%
82078
 
3.3%
62011
 
3.2%
ValueCountFrequency (%)
-15918
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common79590
100.0%

Most frequent character per script

ValueCountFrequency (%)
018207
22.9%
-15918
20.0%
114286
17.9%
213809
17.4%
72840
 
3.6%
42702
 
3.4%
32699
 
3.4%
52608
 
3.3%
92432
 
3.1%
82078
 
2.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII79590
100.0%

Most frequent character per block

ValueCountFrequency (%)
018207
22.9%
-15918
20.0%
114286
17.9%
213809
17.4%
72840
 
3.6%
42702
 
3.4%
32699
 
3.4%
52608
 
3.3%
92432
 
3.1%
82078
 
2.6%

isLastSmallPeriod
Boolean

CONSTANT
REJECTED

Distinct1
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size7.9 KiB
False
7959 
ValueCountFrequency (%)
False7959
100.0%

smallPeriodDays
Real number (ℝ≥0)

Distinct64
Distinct (%)0.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean13.016585
Minimum6
Maximum82
Zeros0
Zeros (%)0.0%
Memory size62.3 KiB

Quantile statistics

Minimum6
5-th percentile6
Q17
median10
Q315
95-th percentile32
Maximum82
Range76
Interquartile range (IQR)8

Descriptive statistics

Standard deviation: 8.841983651
Coefficient of variation (CV): 0.6792859765
Kurtosis: 6.272599062
Mean: 13.016585
Median Absolute Deviation (MAD): 3
Skewness: 2.198417294
Sum: 103599
Variance: 78.18067489
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
61297
16.3%
71063
13.4%
8849
10.7%
9663
 
8.3%
10540
 
6.8%
11457
 
5.7%
12331
 
4.2%
13292
 
3.7%
14256
 
3.2%
15244
 
3.1%
Other values (54)1967
24.7%
ValueCountFrequency (%)
61297
16.3%
71063
13.4%
8849
10.7%
9663
8.3%
10540
6.8%
ValueCountFrequency (%)
821
< 0.1%
782
< 0.1%
761
< 0.1%
751
< 0.1%
741
< 0.1%

bigPeriodDays
Real number (ℝ≥0)

Distinct225
Distinct (%)2.8%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean108.6567408
Minimum11
Maximum363
Zeros0
Zeros (%)0.0%
Memory size62.3 KiB

Quantile statistics

Minimum11
5-th percentile34
Q164
median95
Q3140
95-th percentile232
Maximum363
Range352
Interquartile range (IQR)76

Descriptive statistics

Standard deviation: 62.05017287
Coefficient of variation (CV): 0.5710660233
Kurtosis: 1.279586503
Mean: 108.6567408
Median Absolute Deviation (MAD): 37
Skewness: 1.131802088
Sum: 864799
Variance: 3850.223953
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
67117
 
1.5%
70109
 
1.4%
66103
 
1.3%
6899
 
1.2%
7995
 
1.2%
5493
 
1.2%
7389
 
1.1%
5589
 
1.1%
10783
 
1.0%
12082
 
1.0%
Other values (215)7000
88.0%
ValueCountFrequency (%)
115
 
0.1%
126
0.1%
1312
0.2%
1414
0.2%
159
0.1%
ValueCountFrequency (%)
36320
0.3%
34118
0.2%
33221
0.3%
32920
0.3%
27435
0.4%

riseRange
Real number (ℝ≥0)

Distinct2471
Distinct (%)31.0%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.397605931
Minimum0.5118483412
Maximum6.843658966
Zeros0
Zeros (%)0.0%
Memory size62.3 KiB

Quantile statistics

Minimum0.5118483412
5-th percentile0.8983473816
Q11.02457956
median1.189349112
Q31.549192864
95-th percentile2.530394946
Maximum6.843658966
Range6.331810625
Interquartile range (IQR)0.5246133042

Descriptive statistics

Standard deviation: 0.6667616586
Coefficient of variation (CV): 0.4770741481
Kurtosis: 21.95580931
Mean: 1.397605931
Median Absolute Deviation (MAD): 0.2067837326
Skewness: 3.884252296
Sum: 11123.5456
Variance: 0.4445711094
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
123
 
0.3%
2.33422573621
 
0.3%
6.84365896620
 
0.3%
2.32451599620
 
0.3%
2.01781534919
 
0.2%
1.84578900118
 
0.2%
2.00861077918
 
0.2%
2.53039494617
 
0.2%
1.91900273917
 
0.2%
4.59444183617
 
0.2%
Other values (2461)7769
97.6%
ValueCountFrequency (%)
0.51184834122
< 0.1%
0.56513720971
< 0.1%
0.57664197271
< 0.1%
0.61096829481
< 0.1%
0.62075656661
< 0.1%
ValueCountFrequency (%)
6.84365896620
0.3%
6.7667230147
 
0.1%
5.6201708015
 
0.1%
5.34986797613
0.2%
5.329013512
0.2%

s_bias
Real number (ℝ)

Distinct7958
Distinct (%)> 99.9%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.102562336
Minimum-0.09284573052
Maximum1.106707499
Zeros0
Zeros (%)0.0%
Memory size62.3 KiB

Quantile statistics

Minimum-0.09284573052
5-th percentile0.01599387417
Q10.052735814
median0.084294713
Q30.1304552862
95-th percentile0.248355741
Maximum1.106707499
Range1.199553229
Interquartile range (IQR)0.07771947215

Descriptive statistics

Standard deviation: 0.08221286288
Coefficient of variation (CV): 0.8015892194
Kurtosis: 15.31099321
Mean: 0.102562336
Median Absolute Deviation (MAD): 0.03682441578
Skewness: 2.72368967
Sum: 816.2936325
Variance: 0.006758954823
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.052631578952
 
< 0.1%
0.16028916781
 
< 0.1%
0.10419209151
 
< 0.1%
0.049455991121
 
< 0.1%
0.070257075291
 
< 0.1%
0.11044200891
 
< 0.1%
0.11777488051
 
< 0.1%
0.084013007091
 
< 0.1%
0.1668792981
 
< 0.1%
0.083094937131
 
< 0.1%
Other values (7948)7948
99.9%
ValueCountFrequency (%)
-0.092845730521
< 0.1%
-0.076730892991
< 0.1%
-0.07289231941
< 0.1%
-0.065100342631
< 0.1%
-0.064748201441
< 0.1%
ValueCountFrequency (%)
1.1067074991
< 0.1%
1.0160611651
< 0.1%
0.93733256981
< 0.1%
0.87735525871
< 0.1%
0.86243386241
< 0.1%

m_bias
Real number (ℝ≥0)

Distinct: 7958
Distinct (%): > 99.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.07910327637
Minimum: 1.33226763 × 10^-15
Maximum: 0.9684967843
Zeros: 0
Zeros (%): 0.0%
Memory size: 62.3 KiB

Quantile statistics

Minimum: 1.33226763 × 10^-15
5-th percentile: 0.004440756383
Q1: 0.03109330823
median: 0.06369837651
Q3: 0.1068745556
95-th percentile: 0.2044995751
Maximum: 0.9684967843
Range: 0.9684967843
Interquartile range (IQR): 0.07578124733

Descriptive statistics

Standard deviation: 0.07022518631
Coefficient of variation (CV): 0.8877658364
Kurtosis: 15.84828711
Mean: 0.07910327637
Median Absolute Deviation (MAD): 0.03638174094
Skewness: 2.607508576
Sum: 629.5829767
Variance: 0.004931576792
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
1.33226763 × 10^-15    2      < 0.1%
0.04954463447          1      < 0.1%
0.014943376            1      < 0.1%
0.01403454382          1      < 0.1%
0.3177959703           1      < 0.1%
0.07797642087          1      < 0.1%
0.1746770026           1      < 0.1%
0.03708387393          1      < 0.1%
0.07780040733          1      < 0.1%
0.001395746355         1      < 0.1%
Other values (7948)    7948   99.9%
Value                  Count  Frequency (%)
1.33226763 × 10^-15    2      < 0.1%
2.457998943 × 10^-5    1      < 0.1%
2.709219474 × 10^-5    1      < 0.1%
3.381051237 × 10^-5    1      < 0.1%
5.329993204 × 10^-5    1      < 0.1%
Value                  Count  Frequency (%)
0.9684967843           1      < 0.1%
0.9465185753           1      < 0.1%
0.8779733772           1      < 0.1%
0.834602934            1      < 0.1%
0.7098167787           1      < 0.1%

l_bias
Real number (ℝ≥0)

UNIQUE

Distinct: 7959
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.06394346155
Minimum: 8.881784197 × 10^-16
Maximum: 0.6681156445
Zeros: 0
Zeros (%): 0.0%
Memory size: 62.3 KiB

Quantile statistics

Minimum: 8.881784197 × 10^-16
5-th percentile: 0.00217332084
Q1: 0.01894001911
median: 0.04842025859
Q3: 0.09021202116
95-th percentile: 0.1810652068
Maximum: 0.6681156445
Range: 0.6681156445
Interquartile range (IQR): 0.07127200205

Descriptive statistics

Standard deviation: 0.06036180163
Coefficient of variation (CV): 0.9439870812
Kurtosis: 6.086380221
Mean: 0.06394346155
Median Absolute Deviation (MAD): 0.03350406097
Skewness: 1.867215775
Sum: 508.9260105
Variance: 0.003643547097
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value                  Count  Frequency (%)
0.08950569514          1      < 0.1%
0.01821796196          1      < 0.1%
0.0005867937794        1      < 0.1%
0.004983047083         1      < 0.1%
0.1326325507           1      < 0.1%
0.05491637705          1      < 0.1%
0.02024615647          1      < 0.1%
0.04603049854          1      < 0.1%
0.05457651994          1      < 0.1%
0.1288258333           1      < 0.1%
Other values (7949)    7949   99.9%
Value                  Count  Frequency (%)
8.881784197 × 10^-16   1      < 0.1%
1.33226763 × 10^-15    1      < 0.1%
1.318768025 × 10^-6    1      < 0.1%
3.147732532 × 10^-6    1      < 0.1%
7.887741661 × 10^-6    1      < 0.1%
Value                  Count  Frequency (%)
0.6681156445           1      < 0.1%
0.583336402            1      < 0.1%
0.5002695031           1      < 0.1%
0.4655514956           1      < 0.1%
0.4554244311           1      < 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for a linear relationship the steepness of the line does not affect r; only its direction determines the sign.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
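
The definition above can be sketched in plain Python. The helper `pearson_r` is a hypothetical name, and the inputs are the smallPeriodDays and lossRate values of the first five sample rows, used only as illustrative data.

```python
import math

def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (std_x * std_y)

# Illustrative data: first five sample rows of the report.
small_period_days = [10, 6, 16, 18, 14]
loss_rate = [-0.058727, -0.064941, -0.147570, -0.041165, -0.058563]

r = pearson_r(small_period_days, loss_rate)
# Shifting and positively rescaling one variable should leave r unchanged.
r_rescaled = pearson_r(small_period_days, [100 * y + 5 for y in loss_rate])
print(r, r_rescaled)
```

Five rows are far too few to say anything about the full dataset; the point is only the mechanics and the location/scale invariance.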

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
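
A minimal sketch of this recipe: replace each value by its rank, then apply the covariance-over-standard-deviations formula to the ranks. The helper names are hypothetical, and assigning tied values their average rank is an assumption (a common convention, not something the report states).

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Pearson correlation computed on the rank variables of xs and ys."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in ry) / n)
    return cov / (sx * sy)

# A nonlinear but monotonic relationship: rho should be (numerically) 1.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
rho = spearman_rho(xs, ys)
print(rho)
```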

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
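
The pair-counting description can be sketched directly. This is the simple form that divides by all n(n-1)/2 pairs and counts tied pairs as neither concordant nor discordant. Library implementations often apply a tie correction instead, so results may differ when ties are present. The name `kendall_tau` is hypothetical.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """(concordant - discordant) / total number of pairs, ignoring tie corrections."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
    n = len(xs)
    return (concordant - discordant) / (n * (n - 1) / 2)

# Two concordant pairs and one discordant pair out of three.
tau = kendall_tau([1, 2, 3], [1, 3, 2])
print(tau)
```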

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.

Missing values

Count: a simple visualization of nullity by column.
Matrix: the nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

row   df_index  code    lossRate   startDate   endDate     minDate     isLastSmallPeriod  smallPeriodDays  bigPeriodDays  riseRange  s_bias    m_bias    l_bias
0     0         1       -0.058727  2011-05-03  2011-05-20  2011-05-16  False              10               27             0.876509   0.017394  0.077776  0.002700
1     8         1       -0.064941  2013-01-22  2013-01-29  2013-01-25  False              6                59             1.064736   0.233735  0.138455  0.022700
2     10        1       -0.147570  2013-02-01  2013-03-01  2013-02-22  False              16               59             1.064736   0.198504  0.201464  0.063554
3     18        1       -0.041165  2014-07-03  2014-07-29  2014-07-21  False              18               70             1.044012   0.015170  0.041356  0.011473
4     23        1       -0.058563  2014-11-11  2014-11-28  2014-11-18  False              14               69             1.278643   0.072567  0.011601  0.025239
5     26        1       -0.108339  2014-12-08  2014-12-17  2014-12-15  False              8                69             1.278643   0.286698  0.091743  0.031115
6     27        1       -0.083984  2014-12-17  2015-01-05  2014-12-24  False              12               69             1.278643   0.184408  0.148926  0.048943
7     31        1       -0.010025  2015-04-01  2015-04-13  2015-04-02  False              8                65             1.118075   0.070171  0.026983  0.098919
8     33        1       -0.107059  2015-04-16  2015-06-08  2015-05-18  False              37               65             1.118075   0.216236  0.124301  0.082291
9     37        1       -0.007946  2016-07-04  2016-07-13  2016-07-08  False              8                64             1.035187   0.020999  0.000687  0.008565

Last rows

row   df_index  code    lossRate   startDate   endDate     minDate     isLastSmallPeriod  smallPeriodDays  bigPeriodDays  riseRange  s_bias    m_bias    l_bias
7949  20968     603993  -0.130782  2015-05-27  2015-06-05  2015-05-29  False              8                151            1.486559   0.099149  0.099388  0.180459
7950  20977     603993  -0.010526  2017-02-16  2017-02-23  2017-02-17  False              6                55             1.014613   0.151376  0.027309  0.012395
7951  20980     603993  -0.043393  2017-07-03  2017-07-10  2017-07-04  False              6                88             1.424797   0.080217  0.037798  0.007462
7952  20981     603993  -0.082949  2017-07-10  2017-07-19  2017-07-13  False              8                88             1.424797   0.285799  0.106190  0.001975
7953  20983     603993  -0.103261  2017-07-21  2017-08-03  2017-07-27  False              10               88             1.424797   0.256080  0.203537  0.020847
7954  20984     603993  -0.180723  2017-08-03  2017-08-31  2017-08-18  False              21               88             1.424797   0.102014  0.283564  0.056603
7955  20989     603993  -0.070000  2018-03-05  2018-03-13  2018-03-08  False              7                39             0.914918   0.185068  0.038754  0.006990
7956  20993     603993  -0.020619  2019-03-11  2019-03-18  2019-03-14  False              6                39             0.818557   0.055151  0.111160  0.001857
7957  20994     603993  -0.094949  2019-03-18  2019-04-02  2019-03-27  False              12               39             0.818557   0.040025  0.132945  0.012879
7958  20998     603993  -0.212314  2020-01-14  2020-02-14  2020-02-03  False              18               43             0.848416   0.114265  0.125749  0.011199